本文考虑了有损神经图像压缩(NIC)的问题。当前的最新方法(SOTA)方法采用近似量化噪声的后部均匀的后方,单样本估计量近似于证据下限(ELBO)的梯度。在本文中,我们建议用多个样本重要性加权自动编码器(IWAE)目标训练NIC,该目标比Elbo更紧,并随着样本量的增加而收敛至对数的可能性。首先,我们确定NIC的均匀后验具有特殊的特性,这会影响IWAE目标的Pathiswise和得分函数估计器的方差和偏差。此外,从梯度差异的角度来看,我们提供了有关NIC中通常采用的技巧的见解。基于这些分析,我们进一步提出了多样本NIC(MS-NIC),这是NIC的IWAE靶标。实验结果表明,它改善了SOTA NIC方法。我们的MS-NIC是插件,可以轻松扩展到其他神经压缩任务。
translated by 谷歌翻译
在本文中,我们考虑了神经视频压缩(NVC)中位分配的问题。由于帧参考结构,使用相同的R-D(速率)权衡参数$ \ lambda $的当前NVC方法是次优的,这带来了位分配的需求。与以前基于启发式和经验R-D模型的方法不同,我们建议通过基于梯度的优化解决此问题。具体而言,我们首先提出了一种基于半损坏的变异推理(SAVI)的连续位实现方法。然后,我们通过更改SAVI目标,使用迭代优化提出了一个像素级隐式分配方法。此外,我们基于NVC的可区分特征得出了精确的R-D模型。我们通过使用精确的R-D模型证明其等效性与位分配的等效性来展示我们的方法的最佳性。实验结果表明,我们的方法显着改善了NVC方法,并且胜过现有的位分配方法。我们的方法是所有可区分NVC方法的插件,并且可以直接在现有的预训练模型上采用。
translated by 谷歌翻译
神经图像压缩(NIC)的表现优于传统图像编解码器(R-D)性能。但是,它通常需要R-D曲线上每个点的专用编码器对,这极大地阻碍了其实际部署。尽管最近的一些作品通过有条件的编码实现了比特率控制,但它们在训练过程中施加了强大的先验,并提供了有限的灵活性。在本文中,我们提出了代码编辑,这是一种基于半损坏的推理和自适应量化的NIC的高度灵活的编码方法。我们的工作是可变比特率NIC的新范式。此外,实验结果表明,我们的方法超过了现有的可变速率方法,并通过单个解码器实现了ROI编码和多功能权衡。
translated by 谷歌翻译
基于单个草图图像重建3D形状是由于稀疏,不规则的草图和常规,密集的3D形状之间的较大域间隙而具有挑战性的。现有的作品尝试采用从草图提取的全局功能来直接预测3D坐标,但通常会遭受失去对输入草图不忠心的细节。通过分析3D到2D投影过程,我们注意到表征2D点云分布的密度图(即,投影平面每个位置的点的概率)可以用作代理,以促进该代理重建过程。为此,我们首先通过图像翻译网络将草图翻译成一个更有信息的2D表示,可用于生成密度映射。接下来,通过两个阶段的概率采样过程重建一个3D点云:首先通过对密度映射进行采样,首先恢复2D点(即X和Y坐标);然后通过在每个2D点确定的射线处采样深度值来预测深度​​(即Z坐标)。进行了广泛的实验,定量和定性结果都表明,我们提出的方法显着优于其他基线方法。
translated by 谷歌翻译
In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译
Hybrid unmanned aerial vehicles (UAVs) integrate the efficient forward flight of fixed-wing and vertical takeoff and landing (VTOL) capabilities of multicopter UAVs. This paper presents the modeling, control and simulation of a new type of hybrid micro-small UAVs, coined as lifting-wing quadcopters. The airframe orientation of the lifting wing needs to tilt a specific angle often within $ 45$ degrees, neither nearly $ 90$ nor approximately $ 0$ degrees. Compared with some convertiplane and tail-sitter UAVs, the lifting-wing quadcopter has a highly reliable structure, robust wind resistance, low cruise speed and reliable transition flight, making it potential to work fully-autonomous outdoor or some confined airspace indoor. In the modeling part, forces and moments generated by both lifting wing and rotors are considered. Based on the established model, a unified controller for the full flight phase is designed. The controller has the capability of uniformly treating the hovering and forward flight, and enables a continuous transition between two modes, depending on the velocity command. What is more, by taking rotor thrust and aerodynamic force under consideration simultaneously, a control allocation based on optimization is utilized to realize cooperative control for energy saving. Finally, comprehensive Hardware-In-the-Loop (HIL) simulations are performed to verify the advantages of the designed aircraft and the proposed controller.
translated by 谷歌翻译
Due to their ability to offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, a lack of research on explicitly quantifying the data quality of each view when fusing them renders these models inexplicable, performing unsatisfactorily and inflexible in downstream remote sensing tasks. To fill this gap, in this paper, evidential deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value which describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
translated by 谷歌翻译